SDSC’s ‘Sherlock’ Partners with UC Office of the President to Deliver Data Platform in AWS Cloud

Serverless data management system eliminates costs, strengthens security and more

Published January 25, 2022

 Credit: rawpixel, 123RF; SDSC External Relations

The Sherlock Division at the San Diego Supercomputer Center (SDSC), the University of California Office of the President’s (UCOP) risk and technology delivery services groups, and Kwartile partnered to re-architect and migrate the UCOP Risk Services Data Management System (RDMS 1.0) from an on-premises, Hadoop-based platform to a serverless data lake platform in the Amazon Web Services (AWS) Cloud (RDMS 2.0).

The 18-month effort to bring RDMS 2.0 into production was the result of a strong and dedicated collaboration among SDSC’s Sherlock, UCOP Risk Technology Services, UCOP Technology Delivery Services (TDS) and Kwartile, and it allowed the team to meet its goal of delivering the Cloud-based platform.

The initial RDMS 1.0 service was conceptualized in 2015 and hosted within Sherlock’s secure enclave at SDSC as a Hadoop-based data platform. As the project and its needs evolved, the natural next step was to refactor RDMS 1.0 for a commercial Cloud, allowing the adoption and integration of new Cloud-based technologies and services that would modernize the data platform, yield significant cost savings, enhance security and improve scalability. Driven by these goals, the team undertook a Proof of Value (POV) effort that validated the feasibility and benefits of the technical approach while securing the necessary buy-in from stakeholders. This was followed by a longer, more detailed project engagement to perform the full migration of the existing platform to the new Cloud-based solution.

“This was an excellent collaboration and a well-coordinated effort between the various teams supporting the RDMS transition from Sherlock’s on-premise tenant to its AWS cloud enclave. Due to license renewal constraints, the project was completed within an accelerated timeline. This project is a win-win for the executive sponsor, resulting in both the modernization of the data platform and cost savings achieved through eliminating licensing and other operational costs,” said Nilofeur Samuel, director of Risk Technology Services at UCOP.

Sherlock and its partners’ overall objective for RDMS 2.0 was to create a well-architected solution that focused on delivering value to customers while addressing the following key attributes:

Modernization, Scalability, Reliability and Performance

Adopt a Cloud-native, serverless data management stack that leverages the high availability and performance of AWS Cloud including:

  • Data stored as objects in AWS’s affordable, highly reliable and scalable data store service (S3)
  • AWS Glue, a dynamic compute service that extracts, transforms and loads (ETL) data for use by business intelligence reporting
  • Athena, a serverless, pay-as-you-go, interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL
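
To make the query model above concrete, here is a minimal sketch of how a standard SQL query might be submitted to Athena from Python. It is an illustration only, not the RDMS 2.0 implementation: the table name, database name and S3 results location are hypothetical, and the client passed in is assumed to behave like the one returned by boto3.client("athena").

```python
def build_athena_request(sql, database, output_location):
    """Assemble keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results as objects under this S3 prefix.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def submit_query(athena_client, sql, database, output_location):
    """Submit a query and return its execution ID.

    athena_client is assumed to behave like boto3.client("athena").
    """
    response = athena_client.start_query_execution(
        **build_athena_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Hypothetical table and column names; none of them come from the article.
EXAMPLE_SQL = "SELECT claim_type, COUNT(*) AS n FROM claims GROUP BY claim_type"
```

Because the service is serverless, the caller only submits SQL and later reads the results from S3; there is no cluster to size or keep running between queries.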

Security

A defense-in-depth strategy was employed to secure data including:

  • Separate environments for production and non-production data
  • De-identified data in non-production environments
  • Custom encryption at-rest per data environment
  • Role-based, fine-grained access control to tables and columns in risk data store
  • Data versioning and replication
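
The article does not describe how data was de-identified for non-production environments. As one possible approach, the sketch below replaces sensitive fields with keyed hashes (HMAC-SHA256) so that records remain joinable without exposing the original values. The field names and key are hypothetical, and a real deployment would need a de-identification method that has been reviewed against its own compliance requirements.

```python
import hashlib
import hmac


def pseudonymize(value, key):
    """Return a keyed SHA-256 digest of value as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record, sensitive_fields, key):
    """Copy record, replacing each sensitive field with its pseudonym.

    Fields not listed in sensitive_fields are passed through unchanged.
    """
    return {
        name: pseudonymize(str(value), key) if name in sensitive_fields else value
        for name, value in record.items()
    }


# Hypothetical record layout and key, for illustration only.
example = deidentify(
    {"employee_id": "E1001", "claim_amount": 2500},
    sensitive_fields={"employee_id"},
    key=b"non-production-secret",
)
```

The same input and key always produce the same pseudonym, so joins across tables still work in the non-production copy, while a different key per environment keeps pseudonyms from being linked across environments.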

Cost Savings

The migration from RDMS 1.0 to RDMS 2.0 is projected to save the program approximately $2M over the next five years. These cost savings are primarily achieved by reducing licensing costs, eliminating large capital investment in physical hardware and realizing efficiencies in staffing resulting from the move from on-premise to the Cloud. Specifically:

  • RDMS 1.0 used proprietary licensed software, Cloudera, running on fixed infrastructure
  • RDMS 2.0 runs as AWS Cloud services with a pay-as-you-go model

“As custodians of systemwide data for the university, it is incumbent upon us to continuously explore options for managing data securely, more economically and with greater flexibility and scalability. In recent years, the offerings by commercial cloud service providers, such as AWS, have become viable options for managing data that are congruent with the aforementioned tenets of our mission. Through a strong partnership between Sherlock, Kwartile and UCOP Technology Delivery Services, we were able to leverage the core competencies of each team toward a successful implementation of a modern, cloud-based and highly secure data management platform that could serve as an all-encompassing, strategic and forward-looking approach to data management,” said Hooman Pejman, data architect at UCOP. “In my view, the key to our success was our collective diligence in exploring, identifying, selecting and orchestrating the appropriate services offered by AWS, based on a serverless architecture and a pay-as-you-go model.”

Kwartile’s data engineering solutions provided automated tools for data and metadata migration and helped update and optimize the data curation jobs to run on AWS cloud-native services. “These migration tools provided a comparison report of source and target, which improved data quality and significantly reduced data validation time. Projects of this nature are complex and would not have been successful without the proper collaboration and technology expertise provided by Sherlock and UCOP teams. This was a true team effort from everyone involved, and an outcome of our long-standing partnership,” said Krishna Katikaneni from Kwartile.
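
Kwartile’s tools are not described in detail, but the idea of a source-versus-target comparison report can be sketched in a few lines. The example below is an assumption about how such a check might work, not a description of the actual tooling: it fingerprints each table by row count plus an order-independent checksum, then reports which tables match. The table names and rows are hypothetical.

```python
import hashlib

MASK_64 = (1 << 64) - 1


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of rows.

    The checksum sums per-row hashes modulo 2**64, so it does not
    depend on the order in which rows are read.
    """
    count = 0
    checksum = 0
    for row in rows:
        digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
        checksum = (checksum + int.from_bytes(digest[:8], "big")) & MASK_64
        count += 1
    return count, checksum


def comparison_report(source_tables, target_tables):
    """Compare two {table_name: rows} mappings table by table."""
    report = {}
    for name in sorted(set(source_tables) | set(target_tables)):
        src = table_fingerprint(source_tables.get(name, []))
        tgt = table_fingerprint(target_tables.get(name, []))
        present_in_both = name in source_tables and name in target_tables
        report[name] = {
            "source_rows": src[0],
            "target_rows": tgt[0],
            "match": present_in_both and src == tgt,
        }
    return report
```

A report like this lets reviewers focus on the tables flagged as mismatched instead of validating every table by hand, which is consistent with the reduction in validation time described above.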

According to Sandeep Chandra, executive director of Sherlock Cloud at SDSC, while the individual Cloud services are reliable, the real work lies in orchestrating and configuring these services, which are complex enough that no human could correctly and reliably maintain their state by hand. “Sherlock provided a platform that allowed the team to define infrastructure as code with automated deployments based on changes to a shared code base, including manual approval processes as gatekeepers to control configuration changes. This assures the solution is repeatable, auditable, can be rolled back to a previous state and can easily adapt to frequent incremental change,” he explained. “This re-usable infrastructure as code paradigm allows Sherlock to use the same building blocks and processes adapted and customized to the specific needs of different projects across various engagements.”

About SDSC’s Sherlock Division

SDSC’s Sherlock Division focuses on providing innovative, secure information technology and data services for academia, and state and federal government agencies. It is an SDSC Center of Excellence for secure HIPAA- and FISMA-compliant managed Cloud hosting, and recently added NIST CUI- and CSF-compliant managed Cloud hosting to its offerings. Launched under the brand Sherlock, its major services – Cloud, Compliance, Cybersecurity, and Data Lab – provide a secure foundation for a wide range of research and data collection initiatives. The Sherlock Division supports a variety of entities including the Centers for Medicare and Medicaid Services (CMS), National Institutes of Health (NIH), and University of California Systems. For more information please visit the Sherlock website.

About SDSC

SDSC, located at UC San Diego, is considered a leader in data-intensive computing and cyberinfrastructure, providing resources, services and expertise to the national research community, including industry and academia. Cyberinfrastructure refers to an accessible, integrated network of computer-based resources and expertise, focused on accelerating scientific inquiry and discovery. SDSC supports hundreds of multidisciplinary programs spanning a wide variety of domains, from earth sciences and biology to astrophysics, bioinformatics and health IT.